105 research outputs found

    A study of the effect of process malleability in the energy efficiency on GPU‑based clusters

    The adoption of graphics processing units (GPUs) in high-performance computing (HPC) infrastructures determines, in many cases, the energy consumption of those facilities. For this reason, efficient management and administration of GPU-enabled clusters is crucial for their optimal operation. The main aim of this work is to study and design efficient job-scheduling mechanisms for GPU-enabled clusters by leveraging process malleability techniques, which can reconfigure running jobs depending on the cluster status. This paper presents a model that improves energy efficiency when processing a batch of jobs in an HPC cluster. The model is validated through the MPDATA algorithm, a representative example of the stencil computations used in numerical weather prediction. The proposed solution applies the obtained efficiency metrics in a new reconfiguration policy aimed at job arrays. This solution reduces workload processing time by up to 4.8 times and cluster energy consumption by up to 2.4 times compared with traditional job management, where jobs are not reconfigured during their execution.
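    The reconfiguration policy itself is not reproduced in the abstract. As a rough illustration, a malleable scheduler can expand a running job onto idle GPUs only while each extra GPU still pays off in efficiency. The sketch below is hypothetical: the function names, the Amdahl-style efficiency model, and the threshold are assumptions for illustration, not the paper's actual algorithm.

```c
/* Hypothetical malleability policy sketch (names and heuristic are
 * illustrative, not the paper's algorithm): expand a running job onto
 * idle GPUs only while each additional GPU keeps the job's modeled
 * parallel efficiency above a threshold. */

/* Modeled parallel efficiency of a job on n GPUs, using an Amdahl-style
 * model with serial fraction s. Purely illustrative. */
static double efficiency(double s, int n)
{
    double speedup = 1.0 / (s + (1.0 - s) / (double)n);
    return speedup / (double)n;  /* efficiency = speedup per GPU */
}

/* Return the GPU count the job should be reconfigured to: grow from
 * `current` toward `current + idle` while efficiency stays acceptable. */
int reconfigure(double serial_fraction, int current, int idle, double min_eff)
{
    int n = current;
    while (idle > 0 && efficiency(serial_fraction, n + 1) >= min_eff) {
        n++;
        idle--;
    }
    return n;
}
```

A job with a small serial fraction would absorb most idle GPUs, while a poorly scaling job would be left at its current size, which matches the intuition behind reconfiguring jobs "depending on the cluster status".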

    A remote GPU manager for HPC clusters

    Master's thesis in Intelligent Systems (2013 syllabus). Code: SIU043. Academic year 2013-2014.
    SLURM is a resource manager for clusters that allows a set of heterogeneous resources to be shared among running jobs. However, SLURM is not designed to share resources such as graphics processing units (GPUs). In fact, although SLURM supports generic resource plug-ins to handle GPUs, these can only be accessed exclusively by a job running on the node that hosts them. This is a serious drawback for remote GPU virtualization technologies, whose mission is to provide the user with fully transparent access to all the GPUs in the cluster, regardless of the specific location of both the job and the GPU. In this work we present a new device type in SLURM, "rgpu", which lets an application access any GPU in the cluster from its own node by means of the remote GPU virtualization technology rCUDA. Moreover, with this new scheduling mechanism, a job can use as many GPUs as exist in the cluster, as long as they are available. Finally, we present the results of several simulations that show the benefits of this new approach in terms of increased job scheduling flexibility.

    Accessible C-programming course from scratch using a MOOC platform without limitations

    [EN] The C language has been used for ages in application development in multidisciplinary environments. However, in academia this language is being replaced by higher-level languages because they are easier to understand, learn, and apply. Moreover, the need for professionals with a good knowledge of those high-level languages is constantly increasing because of the boom in mobile devices. This scenario generates a lack of low-level language programmers, who are required in other, less trendy, but equally or more important fields such as science, engineering, and research. In order to revive interest in low-level languages and provide those minority fields with well-prepared staff, we present in this work a MOOC C-programming course addressed to anyone, with or without an IT background. A feature that differentiates this course from other online programming courses is that we focus mainly on the C language syntax, providing, via a self-tuned virtual machine, an encapsulated environment that hides any interaction with the command line of the underlying operating system. A secondary target of this work is to encourage computer science degree students to enrol in the computer architecture specialization at the Universitat Jaume I (Spain). For this purpose, the High Performance Computing and Architectures research group of that university has decided to use this C course as a tool to fill the gap in the current syllabus. The results show that half of the participants who completed the first session of the course finished the course satisfactorily, and the number of computer science degree students who chose the computer architecture specialization the following academic year increased threefold.
    This research has been partly funded by TIN2017-82972-R. Adrián Castelló was supported by the ValI+D 2015 FPI program of the Generalitat Valenciana. http://ocs.editorial.upv.es/index.php/HEAD/HEAD18
    Castelló, A.; Iserte, S.; Belloch, JA. (2018). Accessible C-programming course from scratch using a MOOC platform without limitations. Editorial Universitat Politècnica de València. 1197-1204. https://doi.org/10.4995/HEAD18.2018.8176

    On the use of deep learning and computational fluid dynamics for the estimation of uniform momentum source components of propellers

    This article proposes a novel method based on Deep Learning for the resolution of uniform momentum source terms in the Reynolds-Averaged Navier-Stokes equations. These source terms can represent several industrial devices (propellers, wind turbines, and so forth) in Computational Fluid Dynamics simulations. Current simulation methods require huge computational power, rely on strong assumptions, or need additional information about the device being simulated. In this first approach to the new method, a Deep Learning system is trained with hundreds of Computational Fluid Dynamics simulations with uniform momentum sources so that it can compute the one representing a given propeller from a reduced set of flow velocity measurements near it. Results show a low overall relative error for uniform sources and a moderate error when describing real propellers. This work will make it possible to simulate industrial devices more accurately at a lower computational cost.
    The authors would like to thank the Universitat Jaume I for the project UJI-B2021-70 and the Spanish Agencia Estatal de Investigación for the project PID2021-128405OB-I00. Researcher S. Iserte was supported by the postdoctoral fellowship APOSTD/2020/026 from the Valencian Region Government and European Social Funds. J. Luis-Gómez is supported by the FPU21/03740 doctoral grant from the Spanish Ministry of Universities.
    Peer reviewed. Postprint (author's final draft).

    A Distributed Mesh Generation Study Case through a Customizable Platform as a Service Framework

    Paper presented at the 11th International Conference on Simulation and Modeling Methodologies, Technologies and Applications (SIMULTECH 2021).
    The quality of a mesh can determine the accuracy of a Computational Fluid Dynamics (CFD) simulation. In fact, meshing is not only a highly time-consuming endeavor for users but also demands a lot of computational power. Powerful and practical meshing tools can have a real impact on productivity and on the final result. In this paper, a customizable platform as a service for meshing, named Evoker, is presented and evaluated to help users work on different types of geometries and accelerate mesh generation. Evoker is a zero-installation tool with a web graphical user interface (Web-GUI), whose cloud server runs OpenFOAM in order to provide a friendly front end to its meshing utilities. Evoker also manages cloud computing resources to distribute mesh generation among different processors. Through the presented use case, Evoker proves to be a versatile meshing solution that can save its users a great deal of time.

    Improving the management efficiency of GPU workloads in data centers through GPU virtualization

    [EN] Graphics processing units (GPUs) are currently used in data centers to reduce the execution time of compute-intensive applications. However, the use of GPUs presents several side effects, such as increased acquisition costs and larger space requirements. Furthermore, GPUs require a nonnegligible amount of energy even while idle. Additionally, GPU utilization is usually low for most applications. In a similar way to the use of virtual machines, using virtual GPUs may address the concerns associated with the use of these devices. In this regard, the remote GPU virtualization mechanism could be leveraged to share the GPUs present in the computing facility among the nodes of the cluster. This would increase overall GPU utilization, thus reducing the negative impact of the increased costs mentioned before. Reducing the number of GPUs installed in the cluster could also become possible. However, in the same way as job schedulers map GPU resources to applications, virtual GPUs should also be scheduled before job execution. Nevertheless, current job schedulers are not able to deal with virtual GPUs. In this paper, we analyze the performance attained by a cluster using the remote Compute Unified Device Architecture middleware and a modified version of the Slurm scheduler, which is now able to assign remote GPUs to jobs. Results show that cluster throughput, measured as jobs completed per time unit, is doubled while total energy consumption is reduced by up to 40%. GPU utilization is also increased.
    Generalitat Valenciana, Grant/Award Number: PROMETEO/2017/077; MINECO and FEDER, Grant/Award Number: TIN2014-53495-R, TIN2015-65316-P and TIN2017-82972-R.
    Iserte, S.; Prades, J.; Reaño González, C.; Silla, F. (2021). Improving the management efficiency of GPU workloads in data centers through GPU virtualization. Concurrency and Computation: Practice and Experience. 33(2):1-16. https://doi.org/10.1002/cpe.5275

    DPU Offloading Programming with the OpenMP API

    Data processing units (DPUs) used as network co-processors are an emerging trend in our community, with plenty of opportunities yet to be explored. They have generally been used as domain-specific accelerators transparent to application developers. In the HPC field, DPUs have been used as MPI accelerators, but also to offload some tasks from the general-purpose processor. However, the latter required application developers to deploy MPI ranks in the DPUs, as if they were remote (weak) compute nodes, hence considerably hindering programmability. The wide adoption of OpenMP as the threading model in the HPC arena, along with that of GPU accelerators, is making OpenMP offloading to GPUs a widespread practice for HPC applications. In this paper we introduce, for the first time in the literature, OpenMP offloading support for network co-processor DPUs. We present our design in LLVM to support OpenMP standard offloading semantics and discuss the programming productivity advantages with respect to the existing MPI-based programming model. We also provide the corresponding performance analysis, demonstrating competitive results in comparison with the MPI baseline.
    The authors would like to express their sincere gratitude to Gilad Shainer, Richard Graham, Gil Bloch, and Yong Qin for their support, insightful comments, and suggestions. The research producing this paper received funding from NVIDIA and the Ministerio de Ciencia e Innovación - Agencia Estatal de Investigación (PCI2021-121958). The authors are grateful for the support from the Department of Research and Universities of the Government of Catalonia to the AccMem project (Code: 2021 SGR 00807). Antonio J. Peña was partially supported by the Ramón y Cajal fellowship RYC2020-030054-I funded by MCIN/AEI/10.13039/501100011033 and by "ESF Investing in your future".
    Peer reviewed. Postprint (author's final draft).
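    The paper's DPU support builds on standard OpenMP target semantics. As a minimal illustration of those semantics (not the paper's LLVM implementation), the snippet below offloads a reduction to the default device; with no offload-capable device or compiler, the pragma is ignored and the loop simply runs on the host, producing the same result.

```c
/* Standard OpenMP offloading semantics, as extended by the paper from
 * GPUs to DPUs: the target construct runs the loop on the default
 * device and the reduction result is mapped back to the host.
 * Compilers without offload support fall back to host execution. */
double offload_sum(int n)
{
    double sum = 0.0;

    /* Offload the loop; map `sum` to the device and back. */
    #pragma omp target teams distribute parallel for \
            reduction(+:sum) map(tofrom: sum)
    for (int i = 0; i < n; i++)
        sum += (double)i;

    return sum;  /* e.g. offload_sum(1000) yields 0 + 1 + ... + 999 */
}
```

The productivity argument of the paper is visible even in this sketch: the same directive-annotated loop targets whichever device the runtime exposes, instead of hand-deploying MPI ranks on the co-processor.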

    Accelerating urban scale simulations leveraging local spatial 3D structure

    [EN] This paper presents a hybrid methodology for accelerating Computational Fluid Dynamics (CFD) simulations by intertwining inferences from deep neural networks (DNNs). The strategy exploits the local spatial structure of the velocity field through three-dimensional convolutional kernels within the DNN. The hybrid workflow is composed of two-step cycles in which CFD solver calculations feed predictive models, whose inferences, in turn, accelerate the simulation of the fluid evolution compared with traditional CFD. This approach has proved to reduce time-to-solution by 30% in an urban-scale study case, which makes it possible to generate massive datasets at a fraction of the cost.
    Researcher S. Iserte was supported by postdoctoral fellowship APOSTD/2020/026 from GVA-ESF, while researcher A. Macias was supported by a predoctoral FDGENT fellowship from GVA. The CTE-Power cluster of the Barcelona Supercomputing Center and the Tirant III cluster of the Servei d'Informatica of the University of Valencia were leveraged in this research. The authors want to thank the anonymous reviewers, whose suggestions significantly improved the quality of this manuscript.
    Iserte, S.; Macías, A.; Martínez-Cuenca, R.; Chiva, S.; Paredes Palacios, R.; Quintana-Ortí, ES. (2022). Accelerating urban scale simulations leveraging local spatial 3D structure. Journal of Computational Science. 62:1-11. https://doi.org/10.1016/j.jocs.2022.101741

    Dynamic Management of Resource Allocation for OmpSs Jobs

    Proceedings of the First PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD 2016), Timisoara, Romania, February 8-11, 2016.
    The main purpose of this thesis is to research the relation between task-based programming models and resource management systems in order to provide a smart, autonomous, load-balancing and fault-tolerant system. Thus, taking advantage of malleable MPI applications and execution models such as SPMD and MPMD, we will dig into the principle of dynamic reconfiguration. Apart from providing an overview of the thesis idea, this paper explains our initial motivation and briefly reviews the most remarkable work done in this field.
    This work is partially supported by the EU under the COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS), and by project TIN2014-53495-R from MINECO and FEDER.

    Dynamic spawning of MPI processes applied to malleability

    Malleability allows computing facilities to adapt their workloads through resource management systems to maximize the throughput of the facility and the efficiency of the executed jobs. This technique is based on reconfiguring a job to a different amount of resources during execution and then resuming it. One of the stages of malleability is the dynamic spawning of processes at execution time, where the decisions made affect how the next stage, data redistribution, is performed, which is the most time-consuming one. This paper describes different methods and strategies, defining eight alternatives for spawning processes dynamically, and indicates which one should be used depending on whether the application performs strong or weak scaling. In addition, for both types of applications, it describes which strategies most benefit application performance or system productivity. The results show that reducing the number of spawned processes by reusing the older ones can cut reconfiguration time compared with the classical method by up to 2.6 times when expanding and up to 36 times when shrinking. Furthermore, the asynchronous strategy requires analysing the impact of oversubscription on application performance.
    This work has been funded by the following projects: project PID2020-113656RB-C21, supported by MCIN/AEI/10.13039/501100011033, and project UJI-B2019-36, supported by Universitat Jaume I. Researcher S. Iserte was supported by the postdoctoral fellowship APOSTD/2020/026, and researcher I. Martín-Álvarez was supported by the predoctoral fellowship ACIF/2021/260, both from the Valencian Region Government and European Social Funds.
    Peer reviewed. Postprint (author's final draft).
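    The eight spawning alternatives are not detailed in the abstract, but the core contrast between the classical method and process reuse can be sketched as a hypothetical process count (in real MPI codes the expansion itself is realized with calls such as MPI_Comm_spawn). The function names below are illustrative, not the paper's API.

```c
/* Illustrative sketch (not the paper's API): how many processes each
 * approach must spawn when a job of `old_size` ranks is reconfigured
 * to `new_size` ranks. Spawning fewer processes is what shortens the
 * reconfiguration in the reuse-based strategies. */

/* Classical method: the whole new layout is spawned from scratch and
 * the old ranks are discarded. */
int spawned_classic(int old_size, int new_size)
{
    (void)old_size;  /* old ranks are not reused */
    return new_size;
}

/* Reuse method: old ranks survive; processes are only spawned when
 * expanding (and none at all when shrinking). */
int spawned_reuse(int old_size, int new_size)
{
    return new_size > old_size ? new_size - old_size : 0;
}
```

Shrinking is the extreme case: the reuse method spawns nothing at all, which is consistent with the much larger reported speedup (up to 36x) for shrinking than for expanding (up to 2.6x).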